Abstract

In this paper we propose a novel Bayesian methodology for Value-at-Risk compu- 
tation based on parametric Product Partition Models. Value-at-Risk is a standard 
tool to measure and control the market risk of an asset or a portfolio, and it is also 
required for regulatory purposes. Its popularity is partly due to the fact that it is an 
easily understood measure of risk. The use of Product Partition Models allows us 
to remain in a Normal setting even in the presence of outlying points, and to obtain a
closed-form expression for Value-at-Risk computation. We present and compare two 
different scenarios: a product partition structure on the vector of means and a prod- 
uct partition structure on the vector of variances. We apply our methodology to an 
Italian stock market data set from Mib30. The numerical results clearly show that 
Product Partition Models can be successfully exploited in order to quantify mar- 
ket risk exposure. The obtained Value-at-Risk estimates are in full agreement with 
Maximum Likelihood approaches, but our methodology provides richer information 
about the clustering structure of the data and the presence of outlying points. 
1 Introduction



Following the increase in financial uncertainty, there has been intensive research by financial institutions, regulators and academics to develop models for market risk evaluation. A common and easily understood measure of risk is Value-at-Risk (VaR). In particular, the Basel accords require all financial institutions to meet capital requirements based on VaR estimates; see Basel Committee (2006).



VaR is defined as the maximum potential loss of an asset or a portfolio at a given time horizon and significance level. An accurate estimate of VaR is important for both banks and regulators. An underestimation of risk could obviously cause problems for banks and other participants in financial markets (e.g. bankruptcy). On the other hand, an overestimation of risk may lead institutions to allocate too much capital as a cushion for risk exposures, with a negative effect on profits. The Committee does not prescribe a particular type of model, leaving banks free to specify their own model for VaR estimation. A wide range of models to measure VaR and to determine the level of regulatory capital has been described in the literature. For a review of VaR models see e.g. Jorion (2001), Manganelli and Engle (2004) and the list of VaR contributions at the web site http://www.gloriamundi.org.



In this paper we propose a novel Bayesian methodology for VaR estimation based on parametric Product Partition Models (PPMs), and we compare our results with those obtained with standard approaches based on Maximum Likelihood (ML) techniques; see e.g. Mina and Xiao (2001), Mattedi et al. (2004) and Bormetti et al. (2007). In our analysis we pay particular attention to evaluating the statistical uncertainty associated with the different results; in fact, good risk management requires not only a point estimate of VaR but also an assessment of how precise that estimate is.



If the returns are independent and identically normally distributed, a closed-form and easily implemented expression for VaR can be used. Unfortunately, these assumptions are not adequate for low-liquidity markets and short time horizons, and have to be relaxed. Possible solutions are to resort to heavy-tailed distributions or to abandon the hypothesis of identically distributed returns. In this paper we follow the latter approach and use a Bayesian methodology based on parametric PPMs. We assume that the returns follow a Normal distribution with a partition structure on the parameters of interest. We assign a prior distribution on the space of all possible partitions and identify clusters of returns sharing the same mean and variance values. Returns belonging to different clusters are characterised by different values of either the mean or the variance. The hypothesis of identical distribution holds within but not between clusters. As a consequence we abandon the assumption of identical distribution while preserving the Normal setting. Furthermore, the product partition approach not only accommodates anomalous observations but also provides, as a by-product, a useful tool for their identification; see Quintana and Iglesias (2003), Quintana et al. (2005) and De Giuli et al. (2009) for further details.



We propose and compare two different PPMs for VaR estimation. In the first one we impose a partition structure on the vector of means, whereas the volatility is a common random variable; in the second one we impose a partition structure on the vector of variances with a common unknown mean. The first approach is quite effective for VaR estimation, but it is very sensitive to the values of the prior parameters, and even a hierarchical model cannot reduce this sensitivity. This problem can be overcome by fixing the values of the hyperparameters according to the analysts' experience of market behaviour. This drawback is strongly reduced by imposing a partition structure on the vector of variances. Our results are compared with those obtained with the parametric PPM developed in Loschi et al. (2003) for the identification of change points in financial time series.



To obtain the posterior distribution of the quantities of interest we use Markov Chain Monte Carlo (MCMC) techniques. MCMC methods have a history in mathematical physics dating back to the algorithm of Metropolis et al. (1953), later generalised by Hastings (1970). In our work we extensively resort to a specific type of Markov chain algorithm, introduced by Geman and Geman (1984) and Gelfand and Smith (1990) and known as the Gibbs sampler. The Gibbs sampling algorithms considered here are described in detail in sections 3.1 and 3.2. For a recent Bayesian application of Gibbs sampling in the context of financial analysis, see e.g. Chang and Feigenbaum (2008).



The paper is organized as follows. In section 2 we briefly introduce VaR as a measure of risk, and parametric PPMs. In section 3 we present two models for VaR estimation and introduce a closed-form expression for VaR computation that extends the usual Gaussian form. In section 4 we describe how to exploit the clustering structure induced by PPMs in order to identify outlying points. In section 5 we apply our methodologies to a MIB30 data set and provide a sensitivity analysis of our results with respect to different choices of the hyperparameters. Section 6 closes the paper with some final remarks.
2 Background and Preliminaries



2.1 Value-at-Risk 



VaR refers to the probability of extreme losses due to adverse market movements. In particular, for a given significance level α (typically 1% or 5%), VaR is defined as the maximum potential loss over a fixed time horizon, i.e. the loss that is exceeded only with probability α, for individual assets as well as for portfolios of assets. In the following we focus on VaR for a single asset.

If the returns are independent and identically normally distributed with mean μ and variance σ², a closed-form expression for VaR normalised to the spot price is given by

$$ \frac{\Lambda}{W_0} = -\mu + \sigma\sqrt{2}\,\operatorname{erfc}^{-1}(2\alpha), \tag{1} $$

where Λ is VaR, W_0 is the spot price and erfc⁻¹ is the inverse of the complementary error function. In the following, VaR refers to the quantity Λ/W_0 unless specified otherwise. When this quantity is expressed in percentage terms we call it percentage VaR, VaR(%).
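Equation (1) can be evaluated with the Python standard library alone. The sketch below is only illustrative: since the standard library has no inverse complementary error function, `erfcinv` is a hypothetical helper that inverts `math.erfc` by bisection, and the result is cross-checked against `statistics.NormalDist` through the identity √2 erfc⁻¹(2α) = Φ⁻¹(1 − α).

```python
import math
from statistics import NormalDist


def erfcinv(x, tol=1e-14):
    """Hypothetical helper: inverse of math.erfc on (0, 2), by bisection.

    math.erfc is strictly decreasing, and erfc(-10) and erfc(10) are
    numerically 2 and 0, so the root is bracketed by [-10, 10].
    """
    if not 0.0 < x < 2.0:
        raise ValueError("erfcinv is defined only on (0, 2)")
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > x:
            lo = mid  # erfc(mid) is too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_var(mu, sigma, alpha):
    """Equation (1): VaR normalised to the spot price, Lambda / W0."""
    return -mu + sigma * math.sqrt(2.0) * erfcinv(2.0 * alpha)


# Lambda / W0 equals minus the alpha-quantile of N(mu, sigma^2).
for alpha in (0.01, 0.05):
    print(alpha, normal_var(0.0, 0.01, alpha), -NormalDist(0.0, 0.01).inv_cdf(alpha))
```

Bisection is chosen here only because it is short and dependency-free; a library routine for erfc⁻¹ would normally be preferred.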

In order to estimate the parameters μ and σ in equation (1), we apply a Bayesian approach based on parametric PPMs; the details are provided in the following section.



2.2 Parametric Product Partition Models 



We now briefly review the theory of parametric PPMs with reference to our specific problem. For a detailed and more general presentation see Hartigan (1990) and Barry and Hartigan (1992).



Let y = (y_1, ..., y_t, ..., y_T) denote the vector of returns of a generic asset at different time points t. The returns are conditionally independent given the parameters, and jointly distributed with probability density function f parameterised by the vector (θ, ψ). The elements of θ depend on the time point t, θ = (θ_1, ..., θ_T), whereas ψ is a parameter common to all observations. We consider the following model

$$ y \mid (\theta,\psi) \sim f(y \mid \theta,\psi), \quad \text{with} \quad y_t \mid (\theta_t,\psi) \sim f(y_t \mid \theta_t,\psi), \quad t = 1,\dots,T. \tag{2} $$
Given the model in (2), let S_0 = {t : t = 1, ..., T} be the set of all time periods. A partition of S_0, p = {S_1, ..., S_d, ..., S_|p|} with cardinality |p|, is defined by the properties S_d ∩ S_d' = ∅ for d ≠ d' and ∪_d S_d = S_0. The generic element of p is S_d = {t : θ_t = θ*_d}, where θ* = (θ*_1, ..., θ*_|p|) is the vector of the unique values of θ = (θ_1, ..., θ_T). All θ_t whose subscripts t belong to the same set S_d ∈ p are (stochastically) equal; in this sense they are regarded as a single cluster.

We assign to each partition p the following prior distribution

$$ P\big(p = \{S_1,\dots,S_{|p|}\}\big) = K \prod_{d=1}^{|p|} C(S_d), \tag{3} $$

where C(S_d) is a cohesion function and K is the normalising constant. Equation (3) is referred to as the product distribution for partitions. The cohesions represent prior weights on group formation and formalise our opinion on how tightly clustered the elements of S_d would be.

The cohesions can be specified in different ways. A useful choice is

$$ C(S_d) = c \times (|S_d| - 1)!, \tag{4} $$

where c is a positive constant and |S_d| denotes the cardinality of the set S_d. For moderate values of c, e.g. c = 1, the cohesions in equation (4) yield a prior distribution that favours partitions made of a small number of large subsets. For more details on the choice of c see e.g. Liu (1996), Quintana and Iglesias (2003), Quintana et al. (2005) and Tarantola et al. (2008).
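For very small T the prior (3) with cohesions (4) can be tabulated by brute force. The Python sketch below is our own illustration (function names are hypothetical): it enumerates every partition of {1, ..., T}, multiplies the cohesions and normalises. With c = 1 and T = 3 the single-block partition receives probability 1/3 and each of the other four partitions 1/6; more generally the normalising constant satisfies 1/K = c(c + 1)···(c + T − 1), the identity behind the Dirichlet Process connection recalled below.

```python
import math


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # `first` opens a new block ...
        yield [[first]] + smaller
        # ... or joins one of the existing blocks
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]


def cohesion_product(partition, c):
    """Unnormalised prior of Eq. (3) with the cohesions of Eq. (4)."""
    return math.prod(c * math.factorial(len(block) - 1) for block in partition)


def partition_prior(T, c):
    """Normalised prior of Eq. (3) over all partitions of {1, ..., T}."""
    parts = list(set_partitions(list(range(1, T + 1))))
    weights = [cohesion_product(p, c) for p in parts]
    K = 1.0 / sum(weights)  # the normalising constant K
    return [(p, K * w) for p, w in zip(parts, weights)]


for p, prob in partition_prior(3, c=1.0):
    print(p, round(prob, 4))
```

Enumeration is only feasible for tiny T, since the number of partitions grows like the Bell numbers discussed in section 4.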



If non-contiguous clusters are considered, we can exploit an interesting connection between parametric PPMs and the class of Bayesian nonparametric models with a Dirichlet Process prior; see Antoniak (1974). Under the latter prior, the marginal distribution of the observables is a specific PPM with the cohesion functions specified by equation (4); see Quintana and Iglesias (2003). In this case we can use efficient MCMC algorithms developed for Bayesian nonparametric problems.

When dealing with contiguous blocks, as in the change-point problem, this connection cannot be exploited and specific MCMC algorithms are required; see e.g. Loschi et al. (2003).
3 VaR Computation via Product Partition Models



Let y be the vector of daily returns of a generic asset. We assume that the returns are normally distributed with parameter vector (θ, ψ). We present and compare two different PPMs: in the first one we impose a partition structure on the vector of means, and in the second one we consider partitions of the vector of variances. In the following, the PPM applied to the vector of means will be referred to as the μ-PPM approach, while σ²-PPM will refer to the PPM on the vector of variances. In μ-PPM θ is the vector of means, while in σ²-PPM it is the vector of variances. In the former model ψ is the variance and in the latter it is the mean.

We consider the following hierarchical structure

$$
\begin{aligned}
y_t \mid \big(p, (\theta^*_1,\dots,\theta^*_{|p|}), \psi\big) &\overset{\text{ind}}{\sim} \mathcal{N}(y_t \mid \theta_t, \psi),\\
\theta^*_1,\dots,\theta^*_{|p|} \mid p &\overset{\text{i.i.d.}}{\sim} f(\theta^*),\\
p &\sim \text{product distribution, with } C(S_d) = c \times (|S_d|-1)!,\\
\psi &\sim g(\psi),
\end{aligned}
$$

where f and g denote generic density functions and the product distribution is defined in equation (3).

The elicitation of a partition structure on the vector of means (or variances) allows us to remain in a Normal setting without assuming identically distributed returns. An alternative model that could be used to take atypical returns into account when estimating VaR consists of assuming t-distributed rather than Normal data. This comparison has been performed by Quintana and Iglesias (2003) in the context of regression models, showing that parametric PPMs in a Normal setting are even more effective when the purpose is to accommodate and identify extreme values.

In sections 3.1 and 3.2 we describe our models in detail, and in section 3.3 we propose a closed-form expression for VaR computation.



3.1 Product Partition Models on Vector of Means



In the μ-PPM approach we impose a partition structure on the vector of means μ = (μ_1, ..., μ_T). By inducing a cluster structure on μ we try to accommodate atypical y_t values. To achieve this goal we use the following hierarchical model

$$
\begin{aligned}
y_t \mid \big(p, (\mu^*_1,\dots,\mu^*_{|p|}), \sigma^2\big) &\overset{\text{ind}}{\sim} \mathcal{N}(\mu_t, \sigma^2),\\
\mu^*_1,\dots,\mu^*_{|p|} \mid (p, \sigma^2) &\overset{\text{i.i.d.}}{\sim} \mathcal{N}(m, \tau_0^2\sigma^2),\\
p &\sim \text{product distribution, with } C(S_d) = c \times (|S_d|-1)!,\\
\sigma^2 &\sim \mathcal{IG}(\nu_0, \lambda_0),
\end{aligned}
\tag{5}
$$

where μ* = (μ*_1, ..., μ*_|p|) is the vector of the unique entries of μ for a given partition p, and IG(ν_0, λ_0) is an Inverted Gamma distribution with E[σ²] = λ_0/(ν_0 − 1), ν_0 > 1 and λ_0 > 0.

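The complete joint distribution for the model is given by

$$
\begin{aligned}
f\big(y, \mu^*, p = \{S_1,\dots,S_{|p|}\}, \sigma^2\big) \propto{}& (\sigma^2)^{-\left(\nu_0 + \frac{T}{2} + \frac{|p|}{2} + 1\right)} \exp\Big\{-\frac{1}{2\sigma^2}\sum_{d=1}^{|p|}\sum_{t \in S_d}(y_t - \mu^*_d)^2\Big\}\\
&\times \exp\Big\{-\frac{1}{2\tau_0^2\sigma^2}\sum_{d=1}^{|p|}(\mu^*_d - m)^2 - \frac{\lambda_0}{\sigma^2}\Big\} \prod_{d=1}^{|p|} c\,(|S_d|-1)!.
\end{aligned}
$$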



To fit this model we adapt an algorithm proposed by Bush and MacEachern (1996) in the context of Bayesian nonparametric inference. Once a starting value for the vector μ has been provided, we iteratively sample from the joint posterior distribution of model and parameters by means of the Gibbs algorithm described below.

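Step (i): Sample σ² from its full conditional distribution

$$ \sigma^2 \mid \mu, y \sim \mathcal{IG}\Big(\nu_0 + \frac{T}{2} + \frac{|p|}{2},\; \lambda_0 + \frac{1}{2\tau_0^2}\sum_{d=1}^{|p|}(\mu^*_d - m)^2 + \frac{1}{2}\sum_{t=1}^{T}(y_t - \mu_t)^2\Big). $$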



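Step (ii): Update each μ_t, t = 1, ..., T, by sampling from the mixture

$$ \mu_t \mid \mu_{-t}, \sigma^2, y \sim \sum_{j \neq t} q_{tj}\,\delta_{\mu_j}(\mu_t) + q_{t0}\,\mathcal{N}\Big(\frac{y_t + m/\tau_0^2}{1 + 1/\tau_0^2},\; \frac{\sigma^2}{1 + 1/\tau_0^2}\Big), \tag{6} $$

where μ_{−t} is obtained from μ by removing the t-th entry and δ_{μ_j}(μ_t) is the Dirac delta centred on μ_j.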

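The distribution in equation (6) is a mixture of point masses and a Normal distribution, with weights

$$ q_{tj} \propto \exp\Big\{-\frac{(y_t - \mu_j)^2}{2\sigma^2}\Big\}, \qquad q_{t0} \propto \frac{c}{\sqrt{1 + \tau_0^2}}\exp\Big\{-\frac{(y_t - m)^2}{2\sigma^2(1 + \tau_0^2)}\Big\}, $$

normalised so that q_t0 + Σ_{j≠t} q_tj = 1.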
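The shape of the new-value weight can be checked with a one-line marginalisation, assuming (as in the Pólya-urn scheme underlying the algorithm of Bush and MacEachern) that q_t0 is c times the prior predictive density of y_t and q_tj is the likelihood of y_t evaluated at μ_j:

$$
\begin{aligned}
q_{t0} &\propto c\int \mathcal{N}(y_t \mid \mu, \sigma^2)\,\mathcal{N}(\mu \mid m, \tau_0^2\sigma^2)\,d\mu = c\,\mathcal{N}\big(y_t \mid m,\ \sigma^2(1+\tau_0^2)\big) = \frac{c}{\sqrt{2\pi\sigma^2}\sqrt{1+\tau_0^2}}\exp\Big\{-\frac{(y_t - m)^2}{2\sigma^2(1+\tau_0^2)}\Big\},\\
q_{tj} &\propto \mathcal{N}(y_t \mid \mu_j, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big\{-\frac{(y_t - \mu_j)^2}{2\sigma^2}\Big\}.
\end{aligned}
$$

Dropping the common factor (2πσ²)^(−1/2) gives the unnormalised weights used in Step (ii); the factor c(1 + τ₀²)^(−1/2) is what makes a larger c favour new values.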
Step (iii): Before proceeding to the next Gibbs iteration we update the vector μ* given the partition p, sampling from

$$ \mu^*_d \mid p, \sigma^2, y \sim \mathcal{N}\Big(\frac{\sum_{t \in S_d} y_t + m/\tau_0^2}{|S_d| + 1/\tau_0^2},\; \frac{\sigma^2}{|S_d| + 1/\tau_0^2}\Big), \qquad d = 1,\dots,|p|. $$

This last step was introduced in Bush and MacEachern (1996) to avoid being trapped in sticky patches of the Markov chain state space.



The weights q_tj represent the probability of replacing μ_t with a value μ_j already present in the vector of means. On the other hand, q_t0 represents the probability of replacing the old μ_t value with a newly sampled one. It is worth noticing again the role played by the constant c: a greater value of c increases the probability of generating new values. Generally, the higher c is, the higher the probability of obtaining a large number of clusters.
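Steps (i) to (iii) can be sketched in a few lines of Python. This is a minimal, unoptimised illustration under our own assumptions, not the authors' Fortran implementation: μ is stored as one mean per observation, clusters are identified by exact equality of values, each sweep costs O(T²), the function names are hypothetical, and the new-value weight carries the factor c(1 + τ₀²)^(−1/2) from the marginal of y_t.

```python
import math
import random


def clusters(mu):
    """Group time indices by their (exactly shared) mean value."""
    groups = {}
    for t, v in enumerate(mu):
        groups.setdefault(v, []).append(t)
    return list(groups.values())


def gibbs_sweep_mu_ppm(y, mu, sigma2, m, tau0_sq, nu0, lam0, c, rng):
    """One sweep of Steps (i)-(iii) for the mu-PPM of Eq. (5).

    mu holds one mean per observation; ties define the partition p.
    """
    T = len(y)
    # Step (i): sigma^2 | mu, y ~ IG(shape, scale), drawn as scale / Gamma(shape, 1)
    blocks = clusters(mu)
    shape = nu0 + T / 2 + len(blocks) / 2
    scale = (lam0
             + sum((mu[b[0]] - m) ** 2 for b in blocks) / (2 * tau0_sq)
             + sum((yt - mt) ** 2 for yt, mt in zip(y, mu)) / 2)
    sigma2 = scale / rng.gammavariate(shape, 1.0)

    # Step (ii): update each mu_t from the mixture of Eq. (6)
    shrink = 1 + 1 / tau0_sq
    for t in range(T):
        others = [mu[j] for j in range(T) if j != t]
        logw = [-(y[t] - v) ** 2 / (2 * sigma2) for v in others]
        logw.append(math.log(c) - 0.5 * math.log(1 + tau0_sq)
                    - (y[t] - m) ** 2 / (2 * sigma2 * (1 + tau0_sq)))
        top = max(logw)  # subtract the maximum for numerical stability
        w = [math.exp(lw - top) for lw in logw]
        k = rng.choices(range(len(w)), weights=w)[0]
        if k < len(others):
            mu[t] = others[k]  # join the cluster of an existing value
        else:  # open a new cluster with a fresh draw
            mu[t] = rng.gauss((y[t] + m / tau0_sq) / shrink,
                              math.sqrt(sigma2 / shrink))

    # Step (iii): resample each unique value mu*_d given the partition
    for b in clusters(mu):
        prec = len(b) + 1 / tau0_sq
        new = rng.gauss((sum(y[t] for t in b) + m / tau0_sq) / prec,
                        math.sqrt(sigma2 / prec))
        for t in b:
            mu[t] = new
    return mu, sigma2


# Illustrative run on simulated data (all numbers are hypothetical).
rng = random.Random(42)
y = [rng.gauss(0.0, 0.01) for _ in range(100)] + [0.08, -0.09]
mu, sigma2 = [0.0] * len(y), 1e-4
for _ in range(30):
    mu, sigma2 = gibbs_sweep_mu_ppm(y, mu, sigma2, m=0.0, tau0_sq=100.0,
                                    nu0=2.01, lam0=0.0101, c=1.0, rng=rng)
print(len(clusters(mu)), sigma2)
```

Storing one mean per observation keeps the mixture update of Step (ii) literal (duplicate values automatically weight each cluster by its size), at the price of quadratic cost per sweep.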

As it turns out from the empirical analysis (see section 5), posterior distributions are quite sensitive to the value of the parameter λ_0 of the Inverted Gamma distribution in the hierarchical model defined in (5). We tried to reduce this drawback by introducing a hyperprior distribution on λ_0. This translates into a minor modification of model (5):



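$$
\begin{aligned}
y_t \mid \big(p, (\mu^*_1,\dots,\mu^*_{|p|}), \sigma^2, \lambda_0\big) &\overset{\text{ind}}{\sim} \mathcal{N}(\mu_t, \sigma^2),\\
\mu^*_1,\dots,\mu^*_{|p|} \mid (p, \sigma^2, \lambda_0) &\overset{\text{i.i.d.}}{\sim} \mathcal{N}(m, \tau_0^2\sigma^2),\\
p &\sim \text{product distribution},\\
\sigma^2 \mid \lambda_0 &\sim \mathcal{IG}(\nu_0, \lambda_0),\\
\lambda_0 &\sim \mathcal{G}(\eta, \phi),
\end{aligned}
\tag{7}
$$

where G(η, φ) is a Gamma distribution with E[λ_0] = η/φ, η > 0 and φ > 0. The previous Gibbs sampling algorithm must be modified accordingly. We now also have to provide a starting value for σ², while Step (i) splits into two sub-steps: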



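Step (ia):

$$ \lambda_0 \mid \mu, y, \sigma^2 \sim \mathcal{G}\Big(\nu_0 + \eta,\; \frac{1}{\sigma^2} + \phi\Big). \tag{8} $$

Step (ib):

$$ \sigma^2 \mid \mu, y, \lambda_0 \sim \mathcal{IG}\Big(\nu_0 + \frac{|p|}{2} + \frac{T}{2},\; \lambda_0 + \frac{1}{2\tau_0^2}\sum_{d=1}^{|p|}(\mu^*_d - m)^2 + \frac{1}{2}\sum_{t=1}^{T}(y_t - \mu_t)^2\Big). $$

Steps (ii) and (iii) do not change.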
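Under the Gamma parameterisation stated above (shape η, rate φ), Step (ia) is a single Gamma draw. A small Python sketch follows; the function name and the values of η and φ are hypothetical and chosen only for the Monte Carlo check. Note that `random.gammavariate` takes a shape and a scale, so the rate enters through its reciprocal.

```python
import random


def sample_lambda0(sigma2, nu0, eta, phi, rng):
    """Step (ia), Eq. (8): lambda0 | sigma^2 ~ G(nu0 + eta, rate = 1/sigma^2 + phi).

    random.gammavariate is parameterised by shape and *scale*, so the
    rate 1/sigma^2 + phi is passed as its reciprocal.
    """
    rate = 1.0 / sigma2 + phi
    return rng.gammavariate(nu0 + eta, 1.0 / rate)


# Monte Carlo check of the posterior mean (nu0 + eta) / rate.
rng = random.Random(1)
draws = [sample_lambda0(0.01, nu0=2.01, eta=2.0, phi=100.0, rng=rng)
         for _ in range(20000)]
print(sum(draws) / len(draws), (2.01 + 2.0) / (1 / 0.01 + 100.0))
```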
3.2 Product Partition Models on Vector of Variances



An alternative way to relax the hypothesis of identically distributed returns, without giving up the normality assumption, is to promote the variance σ² from a scalar to the vector σ² = (σ²_1, ..., σ²_T) and to impose a clustering structure on it. Our aim is to create clusters of observations, not necessarily contiguous in time, sharing the same value σ²*_d of the variance.

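We consider the following hierarchical model

$$
\begin{aligned}
y_t \mid \big(\mu, (\sigma^{2*}_1,\dots,\sigma^{2*}_{|p|}), p\big) &\overset{\text{ind}}{\sim} \mathcal{N}(\mu, \sigma^2_t),\\
\mu \mid \big((\sigma^{2*}_1,\dots,\sigma^{2*}_{|p|}), p\big) &\sim \mathcal{N}\Big(m, \frac{\lambda_0}{T(\nu_0 - 1)}\Big),\\
\sigma^{2*}_1,\dots,\sigma^{2*}_{|p|} \mid p &\overset{\text{i.i.d.}}{\sim} \mathcal{IG}(\nu_0, \lambda_0),\\
p &\sim \text{product distribution, with } C(S_d) = c \times (|S_d|-1)!,
\end{aligned}
\tag{9}
$$

with the variance of the Normal prior on μ equal to λ_0/[T(ν_0 − 1)], where λ_0/(ν_0 − 1) is the first moment of IG(ν_0, λ_0) and 1/T is a scaling factor.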

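The joint distribution is given by

$$
\begin{aligned}
f\big(y, \mu, p = \{S_1,\dots,S_{|p|}\}, \sigma^{2*}\big) \propto{}& \exp\Big\{-\frac{T(\nu_0 - 1)}{2\lambda_0}(\mu - m)^2\Big\}\\
&\times \exp\Big\{-\frac{1}{2}\sum_{d=1}^{|p|}\sum_{t \in S_d}\frac{(y_t - \mu)^2}{\sigma^{2*}_d} - \sum_{d=1}^{|p|}\frac{\lambda_0}{\sigma^{2*}_d}\Big\} \prod_{d=1}^{|p|} c\,(|S_d|-1)!\,\big(\sigma^{2*}_d\big)^{-\left(\nu_0 + \frac{|S_d|}{2} + 1\right)}.
\end{aligned}
$$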



To sample from the posterior distribution of model and parameters we use a Gibbs algorithm that generalises the one used in section 3.1. The algorithm consists of the three steps below.



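Step (i): Sample μ from its full conditional distribution

$$ \mu \mid \sigma^2, y \sim \mathcal{N}\left(\frac{m + \sum_{d=1}^{|p|}\sum_{t \in S_d} y_t\,\frac{\lambda_0}{T(\nu_0 - 1)\sigma^{2*}_d}}{1 + \sum_{d=1}^{|p|}|S_d|\,\frac{\lambda_0}{T(\nu_0 - 1)\sigma^{2*}_d}},\; \frac{\lambda_0/[T(\nu_0 - 1)]}{1 + \sum_{d=1}^{|p|}|S_d|\,\frac{\lambda_0}{T(\nu_0 - 1)\sigma^{2*}_d}}\right). $$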

Step (ii): Update each σ²_t, t = 1, ..., T, by sampling from the mixture

$$ \sigma^2_t \mid \sigma^2_{-t}, \mu, y \sim \sum_{j \neq t} q_{tj}\,\delta_{\sigma^2_j}(\sigma^2_t) + q_{t0}\,\mathcal{IG}\Big(\nu_0 + \frac{1}{2},\; \lambda_0 + \frac{(y_t - \mu)^2}{2}\Big), \tag{10} $$

where σ²_{−t} is obtained from σ² by removing the t-th entry and δ_{σ²_j}(σ²_t) is the Dirac delta centred on σ²_j.

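The distribution in equation (10) is a mixture of point masses and an Inverted Gamma distribution, with weights

$$ q_{tj} \propto \frac{1}{\sqrt{\sigma^2_j}}\exp\Big\{-\frac{(y_t - \mu)^2}{2\sigma^2_j}\Big\}, \qquad q_{t0} \propto c\,\frac{\Gamma(\nu_0 + \frac{1}{2})}{\Gamma(\nu_0)}\,\frac{\lambda_0^{\nu_0}}{\big[\lambda_0 + (y_t - \mu)^2/2\big]^{\nu_0 + \frac{1}{2}}}, $$

normalised so that q_t0 + Σ_{j≠t} q_tj = 1, where Γ is the Euler Gamma function.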
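Step (iii): To avoid being trapped in sticky regions of the Markov chain state space, resample σ²*_d from

$$ \sigma^{2*}_d \mid \mu, p, y \sim \mathcal{IG}\Big(\nu_0 + \frac{|S_d|}{2},\; \lambda_0 + \frac{1}{2}\sum_{t \in S_d}(y_t - \mu)^2\Big), \qquad d = 1,\dots,|p|. $$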
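The variance update of Step (ii) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the weights are computed on the log scale with `math.lgamma` for numerical stability, and the factor (2π)^(−1/2) shared by all weights is dropped.

```python
import math
import random


def sigma2_mixture_weights(y_t, mu, others, nu0, lam0, c):
    """Normalised weights of Eq. (10) for updating sigma_t^2.

    others: the T-1 current values sigma_j^2, j != t (duplicates allowed).
    Returns (list of q_tj, q_t0).
    """
    r2 = (y_t - mu) ** 2
    # log q_tj: Normal likelihood of y_t with variance sigma_j^2
    logw = [-0.5 * math.log(s2) - r2 / (2 * s2) for s2 in others]
    # log q_t0: c times the Normal/Inverted-Gamma marginal of y_t
    logw.append(math.log(c) + math.lgamma(nu0 + 0.5) - math.lgamma(nu0)
                + nu0 * math.log(lam0)
                - (nu0 + 0.5) * math.log(lam0 + r2 / 2))
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    w = [v / total for v in w]
    return w[:-1], w[-1]


def draw_sigma2_t(y_t, mu, others, nu0, lam0, c, rng):
    """Draw sigma_t^2 from Eq. (10): reuse an existing value or sample a new one."""
    q_j, q_0 = sigma2_mixture_weights(y_t, mu, others, nu0, lam0, c)
    k = rng.choices(range(len(others) + 1), weights=q_j + [q_0])[0]
    if k < len(others):
        return others[k]
    shape, scale = nu0 + 0.5, lam0 + (y_t - mu) ** 2 / 2
    return scale / rng.gammavariate(shape, 1.0)  # IG(shape, scale) draw
```

As with the μ-PPM, an extreme return is poorly explained by the existing small variances, so most of the mass moves to q_t0 and a new, larger variance cluster tends to open.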



A well-known stylised fact about volatility is the bursting effect, and PPMs can be exploited to identify change points in volatility time series. This problem has been extensively considered by Loschi et al. (2003), Loschi et al. (2007) and Loschi et al. (2008). Although we do not focus on this aspect here, in the empirical analysis in section 5 we use the results from the algorithm of Loschi et al. (2003), labelled σ²-CP, as a yardstick against which to compare our numerical results.



3.3 VaR Estimation 



We now show how the posterior distribution of VaR, and consequently its Bayesian estimate, can be obtained from the output of the MCMC algorithms described in sections 3.1 and 3.2.

First we focus on the PPM on the vector of means. Let μ*^(ℓ) = (μ*_1^(ℓ), ..., μ*_|p^(ℓ)|^(ℓ)) and σ²^(ℓ) denote, respectively, the vector of unique means and the variance sampled at the ℓ-th iteration of the Gibbs algorithm. Each iteration yields its own clustering structure: all returns share the same value σ²^(ℓ), but each cluster is characterised by a different value μ*_d^(ℓ). To provide a single VaR estimate for each iteration of the chain we combine the different entries of μ*^(ℓ) by means of an arithmetic average, and consider the following equation
$$ \frac{\Lambda^{(\ell)}}{W_0} = -\frac{1}{|p^{(\ell)}|}\sum_{d=1}^{|p^{(\ell)}|}\mu^{*(\ell)}_d + \sigma^{(\ell)}\sqrt{2}\,\operatorname{erfc}^{-1}(2\alpha). \tag{11} $$

It is worth noticing that for trivial partitions, i.e. |p| = 1, equation (11) reduces to the usual expression given in equation (1).

If we impose a clustering structure on the vector of variances, VaR can be computed in an analogous way, but the arithmetic average is performed over the different values of σ*_d^(ℓ), that is

$$ \frac{\Lambda^{(\ell)}}{W_0} = -\mu^{(\ell)} + \frac{1}{|p^{(\ell)}|}\sum_{d=1}^{|p^{(\ell)}|}\sigma^{*(\ell)}_d\,\sqrt{2}\,\operatorname{erfc}^{-1}(2\alpha). \tag{12} $$

In this case all returns share the same value μ^(ℓ), but each cluster is characterised by a different value σ²*_d^(ℓ).

The resulting VaR estimate is obtained as the ergodic mean of the quantities Λ^(ℓ) in (11) or (12) for μ-PPM or σ²-PPM respectively:

$$ \frac{\Lambda}{W_0} = \frac{1}{N}\sum_{\ell=1}^{N}\frac{\Lambda^{(\ell)}}{W_0}, \tag{13} $$

where N is the number of Gibbs iterations retained after burn-in.



Finally, VaR under the σ²-CP model is computed in a similar way via equation (12).
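Given stored Gibbs output, equations (11) to (13) reduce to a few averages. In this Python sketch the factor √2 erfc⁻¹(2α) is computed through the identity √2 erfc⁻¹(2α) = Φ⁻¹(1 − α) with `statistics.NormalDist`; the function names, and the assumption that each iteration's draws are available as plain lists, are ours.

```python
import math
from statistics import NormalDist


def z_alpha(alpha):
    """sqrt(2) * erfc^{-1}(2 alpha), i.e. the (1 - alpha) standard Normal quantile."""
    return NormalDist().inv_cdf(1.0 - alpha)


def var_mu_ppm(mu_star, sigma2, alpha):
    """Eq. (11): VaR of one mu-PPM draw (unique means mu_star, common sigma^2)."""
    return -sum(mu_star) / len(mu_star) + math.sqrt(sigma2) * z_alpha(alpha)


def var_sigma2_ppm(mu, sigma2_star, alpha):
    """Eq. (12): VaR of one sigma^2-PPM draw (common mu, unique variances)."""
    mean_sigma = sum(math.sqrt(s2) for s2 in sigma2_star) / len(sigma2_star)
    return -mu + mean_sigma * z_alpha(alpha)


def ergodic_mean(values):
    """Eq. (13): average of the per-iteration VaRs retained after burn-in."""
    return sum(values) / len(values)


# With a single cluster, Eq. (11) collapses to Eq. (1).
print(var_mu_ppm([0.001], 1e-4, 0.01), -NormalDist(0.001, 0.01).inv_cdf(0.01))
```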



4 Product Partition and Outliers Identification 



PPMs can be a useful tool for outlier identification. Following Quintana and Iglesias (2003), we work in a Bayesian decision-theoretic framework and propose an efficient algorithm for outlier identification. We model outliers as a shift in the mean of the data, and consequently we focus on μ-PPM. The extent of that shift is indeed the criterion used by this model to induce a new cluster on the vector of returns, as emerges from the expression of the weights q_tj and q_t0.

Our aim is to select the partition that best separates the main group of standard observations from one or more groups of atypical data. Each partition corresponds to a different model, and the best model is the one minimising a given loss function. Let (μ, σ²) be the vector of parameters of the model and (μ_p, σ²_p) the corresponding vector obtained when p is fixed. We consider the following loss function, which combines the parameter estimation and partition selection problems:

$$ L(p, \mu_p, \sigma^2_p, \mu, \sigma^2) = \frac{k_1}{T}\,\|\mu_p - \mu\|^2 + k_2\,(\sigma^2_p - \sigma^2)^2 + (1 - k_1 - k_2)\,|p|, \tag{14} $$

where ‖·‖ is the Euclidean norm and k_1, k_2 are non-negative cost-complexity parameters with k_1 + k_2 < 1.



Minimising the expected value of (14) is equivalent to choosing the partition that minimises the following score function

$$ sc(p) = \frac{k_1}{T}\,\|\mu_B(y) - \mu_p(y)\|^2 + k_2\,\big[\sigma^2_B(y) - \sigma^2_p(y)\big]^2 + (1 - k_1 - k_2)\,|p|, \tag{15} $$

where the subscript "B" denotes the Bayesian estimate of the corresponding parameter, whereas the subscript "p" denotes the estimate conditional on a given partition p. Formally, μ_B(y) = E[μ | y], μ_p(y) = E[μ | y, p], and analogously for σ²_B(y) and σ²_p(y). The Bayesian estimates are obtained via the Gibbs sampling algorithm described in section 3.1. The evaluation of μ_p(y) and σ²_p(y) also requires Gibbs sampling, but in a structurally simpler version: since the partition p is fixed, we can sample from the joint posterior distribution by iterating Step (i) and Step (iii) while skipping Step (ii).

An exhaustive search over the space of all possible partitions is infeasible. For a set with T elements, the number of possible partitions equals B(T), the Bell number of order T, recursively defined by

$$ B(T+1) = \sum_{k=0}^{T}\binom{T}{k}B(k), \qquad B(0) = 1. $$

This quantity is extremely large even for moderate values of T, so we need to restrict our search to a tractable subset of all partitions.
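The recursion for B(T) is easy to evaluate with exact integer arithmetic. The short Python sketch below (our illustration, with a hypothetical function name) shows how quickly it grows: B(30) already exceeds 10^23, while each series analysed in section 5 has T = 1000 observations.

```python
from math import comb


def bell_numbers(n_max):
    """B(0), ..., B(n_max) via B(T + 1) = sum_k C(T, k) B(k), with B(0) = 1."""
    bell = [1]
    for T in range(n_max):
        bell.append(sum(comb(T, k) * bell[k] for k in range(T + 1)))
    return bell


B = bell_numbers(30)
print(B[:8], B[10], f"{B[30]:.3e}")
```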



To find the minimum of the score function in equation (15), we perform an exhaustive search over the partitions with cardinality up to three, selected as follows.

i) Let μ_B = (μ_1, ..., μ_T) be the vector of the Bayesian estimates of the return means, and μ̃_B = (μ̃_1, ..., μ̃_T̃) the vector of the unique entries of μ_B sorted in increasing order, with μ̃_1 = min_t(μ_t), μ̃_T̃ = max_t(μ_t) and T̃ ≤ T.

ii) We search for the optimal partition over the set of partitions p = {S_1, S_2, S_3}, where S_1 = {t : μ_t < μ̃_i}, S_2 = {t : μ̃_i ≤ μ_t ≤ μ̃_j} and S_3 = {t : μ_t > μ̃_j}, with i, j = 1, ..., T̃ and i ≤ j. We select as optimal the partition for which the score function in (15) attains its minimum.

When i ≠ 1 and j ≠ T̃, p is a genuine cardinality-3 partition. The indexes in S_1 and S_3 may be regarded as representative of the returns lying in the "left tail" and the "right tail" of the empirical distribution of y, while S_2 corresponds to the central region of this distribution. When i = 1 and j = T̃ we are exploring the trivial partition, with S_1 = S_3 = ∅. If only one of i = 1 or j = T̃ holds, the partitions have just two clusters. However, there is an alternative way to generate cardinality-2 partitions: given every cardinality-3 partition p, we consider the new partition p̃ = {S̃_1, S_2}, with S̃_1 = S_1 ∪ S_3. This step is necessary for our search to be exhaustive also over the space of cardinality-2 partitions.

Once the optimal partition has been found, we identify the outliers with those elements of y whose indexes belong to the sets with the lowest cardinality.
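The candidate set of steps i) and ii), the merged cardinality-2 partitions and the outlier rule can be sketched in Python as below. This illustration rests on our own assumptions: the `score` argument stands in for equation (15), whose evaluation in the paper requires a fixed-partition Gibbs run per candidate; `toy_score` is a hypothetical placeholder and not equation (15); and we read "sets with the lowest cardinality" as every block other than the largest one. The enumeration costs O(T̃² T) set operations.

```python
def candidate_partitions(mu_b):
    """Steps i)-ii): partitions {S1, S2, S3} cut at two sorted unique levels,
    plus the cardinality-2 partitions {S1 u S3, S2}. Empty blocks and
    duplicates are dropped; blocks are frozensets of time indices."""
    levels = sorted(set(mu_b))  # unique entries of mu_B in increasing order
    seen = set()
    for i in range(len(levels)):
        for j in range(i, len(levels)):
            lo, hi = levels[i], levels[j]
            s1 = frozenset(t for t, v in enumerate(mu_b) if v < lo)
            s2 = frozenset(t for t, v in enumerate(mu_b) if lo <= v <= hi)
            s3 = frozenset(t for t, v in enumerate(mu_b) if v > hi)
            for blocks in ((s1, s2, s3), (s1 | s3, s2)):
                nonempty = frozenset(b for b in blocks if b)
                if nonempty not in seen:
                    seen.add(nonempty)
                    yield tuple(sorted(nonempty, key=min))


def best_partition(mu_b, score):
    """Return the candidate minimising `score` (a stand-in for Eq. (15))."""
    return min(candidate_partitions(mu_b), key=score)


def outliers(partition):
    """Indices outside the largest block of the selected partition."""
    largest = max(partition, key=len)
    return set().union(*(b for b in partition if b is not largest))


# Toy usage: the score below is a hypothetical stand-in, NOT Eq. (15).
mu_hat = [0.0, 0.0, 0.001, -0.05, 0.0, 0.06]


def toy_score(p):
    within = sum(sum((mu_hat[t] - sum(mu_hat[s] for s in b) / len(b)) ** 2
                     for t in b) for b in p)
    return within + 1e-4 * len(p)  # small hypothetical penalty per block


best = best_partition(mu_hat, toy_score)
print(sorted(outliers(best)))
```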



5 Empirical Analysis of Financial Data 



5.1 The Data 



The methodologies described in the previous sections are now illustrated and tested on the MIB30 index and on its three components with the highest excess kurtosis, for which standard approaches based on Normal distributions usually fail. In particular, we apply our analysis to the Italian assets Lottomatica (LTO.MI), Mediobanca (MB.MI) and Snam Rete Gas (SRG.MI). We consider time series of daily returns from April 2004 to March 2008; each series consists of 1000 daily returns. The data are freely downloadable from the site http://it.finance.yahoo.com.



5.2 Choice of Hyperparameters and Computational Details 



In the examples below we use the following values of the hyperparameters. In models (5), (7) and (9) we set m = 0, while τ_0 = 10^3 in (5). The choice of m is motivated by the fact that in VaR estimation for short time horizons, typically from one day to one week, the mean is usually neglected; see e.g. Mina and Xiao (2001). In the Inverted Gamma distribution we set λ_0 = 0.0101 and ν_0 = 2.01. With these choices the prior expectation and the prior variance of σ² are both 0.01, reflecting what is known from past experience about the volatility behaviour of equity assets. The value of c, which controls the clustering structure over the vector of parameters, is set to 1 in order to favour the creation of a small number of large clusters. As for the score function parameters in equation (15), we set k_1 = 0.996 and k_2 = 0.002, giving priority to the estimation of μ and imposing little restriction on the estimation of the other parameters. For the σ²-CP model, which we use as a yardstick, we set the prior parameters following the suggestions given in Loschi et al. (2003). In particular we consider the conjugate Normal-Inverted-Gamma model, with the probability p that a change occurs at any instant in the sequence equal to 0.1.
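The stated prior moments follow from the Inverted Gamma formulas E[σ²] = λ₀/(ν₀ − 1) and Var[σ²] = λ₀²/[(ν₀ − 1)²(ν₀ − 2)], the latter valid for ν₀ > 2. A quick Python check, using a hypothetical helper:

```python
def inv_gamma_moments(nu0, lam0):
    """Mean and variance of IG(nu0, lam0); requires nu0 > 2 for the variance."""
    mean = lam0 / (nu0 - 1)
    var = lam0 ** 2 / ((nu0 - 1) ** 2 * (nu0 - 2))
    return mean, var


mean, var = inv_gamma_moments(2.01, 0.0101)
print(round(mean, 10), round(var, 10))
```

The choice ν₀ = 2.01 sits just above 2, which keeps the prior variance finite but makes it large relative to the squared mean (a relatively diffuse prior on σ²).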



The programs are written in Fortran 77, with basic linear algebra functions provided by the BLAS and SLATEC libraries. Random number generation and Normal and Gamma sampling are based on the algorithms implemented in the RANDOM library. The interested reader can download these libraries and find more details by browsing the Netlib repository at http://www.netlib.org. The algorithm proposed by Loschi et al. (2003) is freely available at ftp://ftp.est.ufmg.br/pub/loschi/.

We run the MCMC algorithms with 10000 sweeps and a burn-in of 1000. Convergence of the MCMC algorithms is assessed using the diagnostics implemented in the package BOA; see Smith (2001). All numerical computations are performed on an AMD Athlon 64 X2 3800 2.0 GHz processor with 2.0 GB of RAM, running Gentoo Linux (kernel 2.6.22). Each program takes nearly 15 minutes to generate the ergodic sample and compute the posterior distributions of the parameters. The clustering structure at each step of the chain and the relative frequencies of the partitions are computed by means of sorting algorithms, which take a further 10 minutes. In our programs we use sorting algorithms of O(T²) computational complexity. It is possible to reduce the computational burden by means of O(T log T) algorithms. However, it is crucial that sorting preserves the relative order of records with equal keys (i.e. that it is stable), and this in general requires an auxiliary amount of memory.
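A stable O(T log T) alternative can be sketched in Python, whose built-in `sorted` is guaranteed to be stable. The sketch is ours and uses hypothetical function names. Each sampled vector θ is turned into cluster labels assigned in order of first appearance, so that identical partitions from different iterations compare equal; stability guarantees that, within a group of equal values, the first index is the earliest time point.

```python
from collections import Counter


def canonical_partition(theta):
    """Cluster labels of theta, numbered by order of first appearance."""
    order = sorted(range(len(theta)), key=lambda t: theta[t])  # stable sort
    groups = []
    for t in order:
        # equal values are adjacent after sorting, in original time order
        if groups and theta[groups[-1][0]] == theta[t]:
            groups[-1].append(t)
        else:
            groups.append([t])
    groups.sort(key=lambda g: g[0])  # g[0] is each cluster's first time point
    labels = [0] * len(theta)
    for label, g in enumerate(groups):
        for t in g:
            labels[t] = label
    return tuple(labels)


def partition_frequencies(draws):
    """Relative frequency of each partition over the retained MCMC draws."""
    counts = Counter(canonical_partition(theta) for theta in draws)
    return {p: k / len(draws) for p, k in counts.items()}


draws = [[0.1, 0.1, -0.2, 0.1], [0.3, 0.3, 0.5, 0.3], [0.0, 0.0, 0.0, 0.0]]
print(partition_frequencies(draws))
```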



5.3 VaR Results 



In table 1 we report the Bayesian estimates of percentage VaR for α = 1% and α = 5%, together with the 68% posterior credible intervals.

TABLE 1 ABOUT HERE

The estimates of VaR obtained with σ²-PPM and σ²-CP are in good agreement, even though the two approaches are quite different in spirit. The former is a natural extension of μ-PPM to the vector of variances, while the latter is specific to change-point identification.

The PPM on the vector of means generally underestimates VaR with respect to the values given by the PPMs applied to the variances. This can be justified empirically by noticing that for daily time horizons the contribution to VaR due to the volatility σ is an order of magnitude larger than that due to the mean μ.
Figure 1 depicts the posterior distributions of VaR at level α = 1%. The first row presents the results based on the μ-PPM approach, while the second corresponds to σ²-PPM. The posterior distribution of VaR shows a higher variability under the σ²-PPM approach than under μ-PPM.

The posterior expectation of the number of clusters is low for both the μ-PPM and σ²-PPM approaches; moreover, the partitions are characterised by one very large cluster and a few small ones. The results are presented in table 2.

TABLE 2 ABOUT HERE

The arithmetic averages in equations (11) and (12) are therefore dominated by the values of μ*_d^(ℓ) and σ*_d^(ℓ) that correspond to the largest cluster, while outlying clusters introduce corrections to VaR.

FIGURE 1 ABOUT HERE



We now compare our results with those obtained with standard parametric approaches based on ML estimators of the mean and variance. In particular, we consider results obtained with a Normal model and with the generalised Student-t (GST) distribution; see e.g. Bormetti et al. (2007). In the GST we set the tail index ν > 2 in order to keep the variance finite; see the last column of table 3. In the following we take the GST as the benchmark for our analysis, since it shows good agreement with historical simulations; see Bormetti et al. (2007). For the daily returns under study we report in figure 2 the ML estimates and their 68% confidence intervals, computed from the cumulative distribution function obtained by generating 1000 bootstrap copies of the original time series. Numerical details are reported in tables 1 and 3. The solid line in figure 2 joins the estimated values of VaR(%), while the dashed lines connect the boundaries of the 68% credible/bootstrap intervals.

TABLE 3 ABOUT HERE
FIGURE 2 ABOUT HERE



At α = 1% the results obtained with σ²-PPM and σ²-CP are in the best agreement with the GST distribution, while the Normal model and μ-PPM underestimate VaR. The situation is different at α = 5%: in this case μ-PPM is the only model in agreement with the GST distribution, while σ²-PPM and σ²-CP overestimate VaR.



For time horizons longer than one day, we focus mainly on the 10-day holding
period, as required by the Basel Committee for the computation of regulatory
VaR. Indeed, the Committee prescribes the following formula for the calculation
of regulatory capital for market risk



$$
\mathrm{MRC}_t = \max\left( \frac{k}{60} \sum_{i=1}^{60} \Lambda^{0.01}_{t-i}(10),\ \Lambda^{0.01}_{t-1}(10) \right)
$$



where $\mathrm{MRC}_t$ is the market risk capital at time t, $\Lambda^{0.01}_{t-i}(10)$ is
the VaR (not normalized by $W_0$, see equation (1)) at α = 1% for a 10-day
horizon, computed using past returns up to time t − i, and k is a penalty
multiplier ranging from 3 to 4, fixed according to the traffic light rule; see
Basel Committee (2006) for more details. From the original daily returns time
series we compute the series of non-overlapping 10-day returns in order to
obtain the 10-day VaR forecast for both μ-PPM and σ²-PPM. This approach can
easily be generalised to an arbitrary holding period.
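As a rough illustration of the two computations in the paragraph above, the sketch below aggregates daily returns into non-overlapping 10-day returns and evaluates the capital formula. It assumes log-returns (so that summation is the correct aggregation), drops an incomplete final block, and takes the multiplier k as given rather than implementing the traffic light rule. The function names are hypothetical.

```python
import math


def non_overlapping_returns(daily_log_returns, horizon=10):
    """Sum daily log-returns over consecutive, non-overlapping blocks.

    Assumes log-returns, which are additive over time; an incomplete
    final block is dropped.
    """
    n_blocks = len(daily_log_returns) // horizon
    return [
        math.fsum(daily_log_returns[b * horizon:(b + 1) * horizon])
        for b in range(n_blocks)
    ]


def market_risk_capital(var_history, k=3.0):
    """MRC_t = max(k/60 * sum of the last 60 VaRs, most recent VaR).

    var_history holds 10-day, 1% VaR values in monetary units (not
    normalised by initial wealth), ordered from oldest to newest, so the
    last entry plays the role of the VaR computed at time t - 1.
    """
    if not 3.0 <= k <= 4.0:
        raise ValueError("the penalty multiplier k ranges from 3 to 4")
    if len(var_history) < 60:
        raise ValueError("at least 60 VaR values are required")
    last_60 = var_history[-60:]
    return max(k * math.fsum(last_60) / 60.0, last_60[-1])


if __name__ == "__main__":
    print(non_overlapping_returns([0.001] * 25))  # two complete 10-day blocks
    print(market_risk_capital([1.0e6] * 60, k=3.0))
```

With a stable VaR history the averaged term dominates, since k ≥ 3; the most recent VaR only binds after a sudden jump in estimated risk.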

TABLE 4 ABOUT HERE

In table 4 we report the estimated 10-day-ahead VaR(%) for the standard
significance levels α = 1% and α = 5%, with their 68% credible intervals.
Basel regulations recommend using the so-called square-root-of-time rule to
obtain the 10-day VaR from the one-day VaR. However, as already pointed out
by Danielsson et al. (1998), this rule strongly depends on the assumption of
i.i.d. normally distributed returns. The ratio between the 10-day and one-day
VaR estimated with parametric PPMs is readily computed, and our results indeed
confirm a statistically significant violation of the square-root scaling law,
highlighting the ability of our approach to better capture the properties of
return time series.
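To make the role of the i.i.d. Normal assumption concrete, the sketch below computes the h-day VaR when daily log-returns are i.i.d. Normal with mean μ and standard deviation σ, so that the h-day return has mean hμ and standard deviation √h σ. The ratio VaR(h)/VaR(1) then equals √h only when μ = 0, so even in the Normal case the square-root rule is exact only without drift. This illustrates the rule under these stated assumptions; it is not the PPM computation, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def iid_normal_var(mu, sigma, alpha, horizon=1):
    """h-day VaR (positive loss) for i.i.d. Normal daily log-returns.

    The h-day return is then Normal with mean h*mu and standard
    deviation sqrt(h)*sigma.
    """
    z_alpha = NormalDist().inv_cdf(alpha)
    return -(horizon * mu + math.sqrt(horizon) * sigma * z_alpha)


def scaling_ratio(mu, sigma, alpha, horizon=10):
    """VaR(h) / VaR(1); equals sqrt(h) only when the drift mu is zero."""
    return (iid_normal_var(mu, sigma, alpha, horizon)
            / iid_normal_var(mu, sigma, alpha, 1))


if __name__ == "__main__":
    print(math.sqrt(10))                       # square-root-of-time factor
    print(scaling_ratio(0.0, 0.015, 0.01))     # zero drift: sqrt(10) up to rounding
    print(scaling_ratio(0.0005, 0.015, 0.01))  # positive drift: below sqrt(10)
```

Comparing an empirically estimated ratio with √10 in this way is the kind of check behind the statement about the scaling law; with fat tails or clustering in the returns the departure can be much larger than the drift effect shown here.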



5.4 Sensitivity Analysis and Outlier Detection



Although the parameter values used in the previous sections represent our
prior knowledge and beliefs about the problem, it is instructive to assess the
sensitivity of the results to other choices of the hyperparameters.

We first consider the dependence of the VaR estimates on the value of the
hyperparameter c in the cohesion function. In figure 3 we plot the results for
the μ-PPM model for α = 1% and c = 0.1, 0.5, 1, 5, 10, 50.

FIGURE 3 ABOUT HERE

Note that for MIB30 and SRG.MI the results are remarkably robust over a
wide range of values of c. For MB.MI and LTO.MI the estimated value of VaR
exhibits a slightly decreasing trend.

To study the sensitivity of our results to the parameters λ₀ and ν₀ of the
Inverted Gamma distribution, it is convenient to re-express them in terms of a
common parameter a. We set λ₀ = a(a + 1) and ν₀ = 2 + a in order to obtain a
prior expectation and variance for σ² both equal to a. In figure 4 we present
the results for a = 0.0001, 0.001, 0.01, 0.1, 1. For a = 1 the results are
completely out of scale.
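The choice λ₀ = a(a + 1), ν₀ = 2 + a can be checked directly. Assuming σ² follows an Inverted Gamma distribution with shape ν₀ and scale λ₀ (density proportional to x^(−ν₀−1) exp(−λ₀/x)), its mean is λ₀/(ν₀ − 1) and its variance λ₀²/((ν₀ − 1)²(ν₀ − 2)); substituting gives a(a + 1)/(a + 1) = a and a²(a + 1)²/((a + 1)² a) = a. The exact parameterisation is not restated in this section; this shape/scale form is the one under which the stated identities hold. The helper names are hypothetical.

```python
def inverse_gamma_moments(shape, scale):
    """Mean and variance of an Inverted Gamma(shape, scale) variable.

    Parameterisation: density proportional to x**(-shape - 1) * exp(-scale / x).
    The variance is finite only for shape > 2.
    """
    if shape <= 2:
        raise ValueError("the variance is finite only for shape > 2")
    mean = scale / (shape - 1)
    variance = scale**2 / ((shape - 1) ** 2 * (shape - 2))
    return mean, variance


def hyperparameters(a):
    """Map the common parameter a to (nu0, lambda0) = (2 + a, a * (a + 1))."""
    return 2.0 + a, a * (a + 1.0)


if __name__ == "__main__":
    for a in (0.0001, 0.001, 0.01, 0.1, 1.0):
        nu0, lambda0 = hyperparameters(a)
        mean, variance = inverse_gamma_moments(shape=nu0, scale=lambda0)
        print(f"a={a:<7} nu0={nu0:.4f} lambda0={lambda0:.6g} "
              f"mean={mean:.6g} variance={variance:.6g}")
```

Because ν₀ = 2 + a, the shape stays just above 2 for small a, which keeps the prior variance finite while letting it shrink together with the prior mean.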



FIGURE 4 ABOUT HERE



In this paper we use a = 0.01, reflecting past knowledge regarding the problem
at hand. For this reason we focus on a region around this value. The highest
stability is reached when the PPM approach is applied to the vector of
variances: for a < 0.01 the results, within the 68% credible intervals, are
almost identical. The μ-PPM is less stable. These results are confirmed in
figure 5, where we plot the posterior distributions of the α = 1% VaR for
LTO.MI with a = 0.0001, 0.001, 0.01, 0.1.



FIGURE 5 ABOUT HERE



We note that for a = 0.0001 and a = 0.001 the distributions obtained with
the σ²-PPM are almost overlapping. A similar behaviour is observed for the
other three time series.



We also explored separately the roles played by λ₀ and ν₀, and we found that
λ₀ plays a crucial role. We then tested the effects of a hyperprior on the
scale parameter λ₀, considering various combinations of the η and φ
parameters, as given in equation (8). For none of the tested values were we
able to achieve a reasonable reduction in sensitivity. For the sake of
parsimony we do not report these results here.



FIGURE 6 ABOUT HERE



In order to identify outlying points we apply the outlier detection procedure
introduced earlier in the paper. The results are reported in figure 6. Returns
corresponding to atypical values are represented by a small triangle (gains) or
a small circle (losses). Their identification is a by-product of our approach
to VaR computation. It could be interesting to investigate the economic
reasons behind these anomalous fluctuations of asset prices, along the lines
of De Giuli et al. (2009). Finally, it is interesting to investigate the
stability of our procedure with respect to the value of c. Table 5 summarizes
the results of our analysis when increasing c from 0.1 to 50.



TABLE 5 ABOUT HERE



The outlier identification algorithm appears to be quite stable. As expected,
on average the number of outlying points increases for increasing values of c.
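One way to see why larger values of c tend to produce more clusters, and hence more points flagged as outlying, is to look at the prior. The cohesion function is not restated in this section; the sketch below assumes the common choice c(S) = c·(|S| − 1)!, under which the PPM prior on partitions coincides with the Ewens (Dirichlet-process) partition with concentration c, so that observation i + 1 starts a new cluster with probability c/(c + i). The prior expected number of clusters among n observations is then the sum of these probabilities, which increases with c. This describes the prior only; the posterior also depends on the data, and `expected_clusters` is a hypothetical helper.

```python
import math


def expected_clusters(n, c):
    """Prior mean number of clusters among n observations.

    Assumes the cohesion c(S) = c * (|S| - 1)!, under which the prior on
    partitions is the Ewens (Dirichlet-process) distribution with
    concentration c: observation i + 1 starts a new cluster with
    probability c / (c + i), for i = 0, ..., n - 1.
    """
    return math.fsum(c / (c + i) for i in range(n))


if __name__ == "__main__":
    n = 1000  # illustrative series length
    for c in (0.1, 0.5, 1, 5, 10, 50):
        print(f"c = {c:>4}: prior E[number of clusters] = {expected_clusters(n, c):.2f}")
```

For c = 1 the sum reduces to the harmonic number H_n, so the prior number of clusters grows only logarithmically in n; larger c shifts prior mass towards partitions with more, smaller clusters.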
5.5 Backtesting Procedures



The current internal model verification procedure of the Basel II framework
consists of recording the daily exceptions of the 1% VaR over the last year.
We apply standard coverage tests to assess the accuracy of our VaR models; in
particular, we consider the unconditional coverage (UC) test by Kupiec (1995)
and the conditional coverage (CC) test by Christoffersen (1998). Kupiec's
test focuses on whether the actual number of VaR exceptions is equal to the
expected number. Assuming that the probability of observing an exception
is p, the number of exceptions out of a sample of N observations follows a
Binomial distribution Bin(N, p). The null hypothesis p = α can be assessed
using the following generalised likelihood ratio test



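$$
\mathrm{LR}_{uc} = -2 \log\left[ (1-\alpha)^{N-n} \alpha^{n} \right] + 2 \log\left[ \left(1 - \frac{n}{N}\right)^{N-n} \left(\frac{n}{N}\right)^{n} \right]
$$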



where n is the observed number of exceptions. Under the null hypothesis this
quantity is asymptotically chi-square distributed with one degree of freedom,
so the model is rejected at the 5% significance level when $\mathrm{LR}_{uc} > 3.84$.
The $\mathrm{LR}_{uc}$ test can be extended to test the serial independence of
deviations by introducing a deviation indicator, which is equal to 0 if VaR is
not exceeded and 1 otherwise. We consider the following combined test statistic
(Christoffersen's test)



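$$
\mathrm{LR}_{cc} = \mathrm{LR}_{uc} + \mathrm{LR}_{ind},
$$
$$
\mathrm{LR}_{ind} = -2 \log\left[ \left(1 - \frac{n}{N}\right)^{N_{00}+N_{10}} \left(\frac{n}{N}\right)^{N_{01}+N_{11}} \right] + 2 \log\left[ (1-\pi_0)^{N_{00}} \pi_0^{N_{01}} (1-\pi_1)^{N_{10}} \pi_1^{N_{11}} \right]
$$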



where $N_{ij}$ is the number of days in which state j occurred while state i
occurred on the previous day, and $\pi_i$ is the probability of observing an
exception conditional on state i on the previous day, that is
$\pi_0 = N_{01}/(N_{00} + N_{01})$ and $\pi_1 = N_{11}/(N_{10} + N_{11})$. The null
hypothesis of the independence test states that the occurrence of a violation
on a given day does not depend on the indicator state of the previous day.
Under this hypothesis, the $\mathrm{LR}_{cc}$ statistic is asymptotically
chi-square distributed with two degrees of freedom, and the VaR model is
rejected at the 5% significance level if $\mathrm{LR}_{cc} > 5.99$.
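Both statistics can be computed from an exception indicator series as sketched below. This is a minimal illustration that follows the formulas above, including the use of n/N in the restricted likelihood of the independence term, with the convention 0·log 0 = 0 for empty cells; the function names are hypothetical and this is not the authors' implementation.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def lr_uc(n, N, alpha):
    """Kupiec unconditional coverage statistic (reject at 5% if > 3.84)."""
    p_hat = n / N
    log_null = _xlogy(N - n, 1 - alpha) + _xlogy(n, alpha)
    log_alt = _xlogy(N - n, 1 - p_hat) + _xlogy(n, p_hat)
    return -2.0 * log_null + 2.0 * log_alt


def transition_counts(indicators):
    """counts[i][j]: days in state j whose previous day was in state i."""
    counts = [[0, 0], [0, 0]]
    for prev, curr in zip(indicators, indicators[1:]):
        counts[prev][curr] += 1
    return counts


def lr_cc(indicators, alpha):
    """Christoffersen statistic LR_cc = LR_uc + LR_ind (reject at 5% if > 5.99)."""
    N, n = len(indicators), sum(indicators)
    (n00, n01), (n10, n11) = transition_counts(indicators)
    p_hat = n / N
    pi0 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi1 = n11 / (n10 + n11) if n10 + n11 else 0.0
    log_null = _xlogy(n00 + n10, 1 - p_hat) + _xlogy(n01 + n11, p_hat)
    log_alt = (_xlogy(n00, 1 - pi0) + _xlogy(n01, pi0)
               + _xlogy(n10, 1 - pi1) + _xlogy(n11, pi1))
    lr_ind = -2.0 * log_null + 2.0 * log_alt
    return lr_uc(n, N, alpha) + lr_ind


if __name__ == "__main__":
    print(round(lr_uc(5, 255, 0.01), 3))  # 1.857
    print(round(lr_uc(1, 255, 0.01), 3))  # 1.237
```

With N = 255, these two values agree with the α = 1% entries for μ-PPM and σ²-PPM in Table 6. Since the unrestricted transition probabilities can always reproduce the restricted fit, the independence term is non-negative and $\mathrm{LR}_{cc} \ge \mathrm{LR}_{uc}$.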

We apply the validation tests described above to all our data series, using
a rolling window of returns to compute the VaR estimate with our models and
comparing this estimate with the realized return. More precisely, at each stage
j = 1, ..., N, our Gibbs sampling algorithms compute the ex ante VaR estimate
$\mathrm{VaR}^{\alpha}_{\mathrm{MAX}_j}$ using the returns $y_i$ with $i = j, \ldots, \mathrm{MAX}_j$; we then check
$\mathrm{VaR}^{\alpha}_{\mathrm{MAX}_j}$ against the ex post realized return $y_{\mathrm{MAX}_j+1}$. An exception occurs
when $y_{\mathrm{MAX}_j+1} < -\mathrm{VaR}^{\alpha}_{\mathrm{MAX}_j}$. A state indicator $I_j$ is set equal to 1 if we
register an exception, and equal to 0 otherwise. In this way we obtain the
numbers n and $N_{ij}$ needed to compute the $\mathrm{LR}_{cc}$ statistic.
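The rolling-window loop can be sketched as follows. The Gibbs-sampler VaR is replaced here by a hypothetical stand-in, `normal_var_estimate` (a Normal VaR from ML estimates), purely to keep the example self-contained; any function mapping a window of returns and α to a positive VaR could be passed instead. Indices are 0-based, so stage j uses `returns[j:j + window_length]` and is checked against `returns[j + window_length]`; with 1000 returns and a window of 745 returns this gives the 255 stages used for the backtest.

```python
import random
from statistics import NormalDist, fmean, pstdev


def normal_var_estimate(window, alpha):
    """Hypothetical stand-in for the Gibbs-sampler VaR: ML Normal VaR."""
    mu = fmean(window)
    return -(mu + pstdev(window, mu) * NormalDist().inv_cdf(alpha))


def rolling_backtest(returns, window_length, alpha, var_fn=normal_var_estimate):
    """Exception indicators I_j for a rolling-window VaR backtest.

    At stage j the VaR is estimated from returns[j:j + window_length]
    and compared with the next, out-of-sample return.
    """
    indicators = []
    for j in range(len(returns) - window_length):
        var_j = var_fn(returns[j:j + window_length], alpha)
        realized = returns[j + window_length]
        indicators.append(1 if realized < -var_j else 0)
    return indicators


if __name__ == "__main__":
    gen = random.Random(1)
    returns = [gen.gauss(0.0, 0.01) for _ in range(1000)]  # synthetic data
    ind = rolling_backtest(returns, window_length=745, alpha=0.01)
    print(len(ind), sum(ind))  # number of VaR estimates N, number of exceptions n
```

The resulting indicator list is exactly the input needed for the exception count n and the transition counts $N_{ij}$.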



TABLE 6 ABOUT HERE



Choosing $\mathrm{MAX}_j = j + 744$, we are able to use all the information from our
original series of 1000 returns and to obtain N = 255 VaR estimates, roughly
corresponding to one trading year. Table 6 reports the results for the
LTO.MI series. Our VaR models perform reasonably well with respect to both
Kupiec's and Christoffersen's tests; the only exception is the σ²-PPM model
with α = 5%, which produced n = 5 exceptions, fairly low compared with the
expected number E[n] = 255 × 0.05 ≈ 13. The reason for this shortcoming lies
in the behaviour of the return series: an empirical study of that series has
shown that the associated high-frequency volatility decreases almost
monotonically with time. Since the algorithm is trained on returns from the
high-volatility regime and tested against returns from a low-volatility regime,
the resulting VaR evaluation is rather conservative. A similar behaviour is
also observed for the other series.
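How unusual n = 5 exceptions is under the nominal rate can also be quantified with the exact Binomial distribution Bin(N, α) introduced for Kupiec's test. The sketch below is an illustrative complement to the asymptotic likelihood ratio tests, not a procedure used in the paper; `binomial_cdf` is a hypothetical helper built on Python's `math.comb`.

```python
from math import comb


def binomial_cdf(k, N, p):
    """P(X <= k) for X ~ Bin(N, p), summing the probability mass function."""
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(k + 1))


if __name__ == "__main__":
    N, alpha = 255, 0.05
    print(f"E[n] = {N * alpha:.2f}")  # 12.75
    print(f"P(n <= 5) = {binomial_cdf(5, N, alpha):.4f}")
```

A small lower-tail probability points to an overly conservative VaR rather than an overly aggressive one, which is consistent with the volatility-regime explanation given above.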



6 Concluding Remarks and Future Research 



In this paper we have presented a novel Bayesian methodology for VaR
computation based on parametric PPMs. The main advantages of our approach
are that it allows us to remain in the Normal setting, to identify anomalous
observations and to obtain a closed-form expression for the VaR measure.
This expression generalizes the standard parametric formula used in the
literature under the normality assumption. By means of PPMs we induce a
clustering structure over the vector of means (μ-PPM), and we find the best
agreement with ML approaches for significance levels of order 5%. For lower
values of α we obtained the best results by applying the PPMs to the vector
of variances (σ²-PPM).



We are currently working on the extension of the σ²-PPM approach to
portfolio analysis. The increase in the number of assets translates into an
increased dimensionality of the problem: the vector of variances is replaced
by a vector of covariance matrices. In order to reduce the number of
parameters involved, we are exploring several filtering techniques, see e.g.
Laloux et al. (1999), Plerou et al. (1999), and Tumminello et al. (2007).
Table 1

Daily estimated VaR(%) values at 5% and 1% significance levels with 68% credible
intervals.

VaR(%)        α = 5%                          α = 1%
              μ-PPM    σ²-PPM    σ²-CP        μ-PPM    σ²-PPM    σ²-CP



MIB30.MI 


1-451851 


1 74+O.H 
L - '^-0.12 


1.76 


fO.Ol 
-0.01 


9 n7 +0.06 
z - u '-0.06 


2.4 8 ;°i? 


2 40+0-01 
z -^ y -o.oi 


LTO.MI 


2.0812:8? 


9 7 o+0.15 
2 -' 8 -0.16 


2.66; 


f0.02 
-0.02 


9 qc;+0.09 
z - yd -0.09 


3.94+^1 


o 70+O.O3 
°- '"-0.03 


MB.MI


1 qi+0.07 
± - y± — 0.08 


2 40 +0 ' 12 

z -^ u -0.12 


2.36 


fO.Ol 
-0.01 


9 79+0.11 
z - ' z -0.11 


o 4n +0.17 


q qC+0.02 

°- o,J -0.02 


SRG.MI 


i co+0.05 


1 O7+0.12 
i - y ' -0.13 


2.01 


fO.Ol 
-0.01 


9 9fi +0.06 
z - zu -0.06 


9 01 +0.17 


9 07+O.O2 
z -°'-0.02 



Table 2

Posterior mean of the number of clusters and relative weight of the largest cluster
for μ-PPM and σ²-PPM.

              Number of clusters      Largest cluster weight
              μ-PPM     σ²-PPM        μ-PPM     σ²-PPM
MIB30.MI      3.11      3.39          0.986     0.990
LTO.MI        5.02      4.52          0.963     0.944
MB.MI         4.11      3.72          0.968     0.970
SRG.MI        3.44      3.59          0.984     0.978



Table 3

Daily ML estimated VaR(%) values at 5% and 1% significance levels with 68%
bootstrap intervals. In the last column we report the central value and 68%
bootstrap interval for the tail index ν of the GST.

VaR(%)        α = 5%                   α = 1%
              Normal    Student-t      Normal    Student-t



MIB30.MI 
LTO.MI 
MB.MI
SRG.MI 



1.38 
2.50 
2.07 



1.62 



+0.05 
+0.06
+0.01
2.15
2.22

Table 4

Estimated 10-day VaR(%) values at 5% and 1% significance levels with 68% credible
intervals.

VaR(%)        α = 5%               α = 1%
              μ-PPM    σ²-PPM      μ-PPM    σ²-PPM



MIB30.MI 


4 iq+0.41 


4 4Q+0-46 
^°-0.46 




6.33^ 


-0.59 

0.61 


LTO.MI 


n oc+0.69 

D - oo -0 .70 


' - oy -0.85 


9.0818* 


10.90 


fl.ll 

-1.13 


MB.MI


D - oo -0.65 


7 no+0.70 


41+O.82 
y - 4i -0.80 


10.07 


f0.92 
-0.91 


SRG.MI 


4 40+0.47 

^•^-0.47 


4 Oq+0.54 

4 -^-0.55 


6 60 +0 - 59 

D - DU -0.59 


7.06j 


-0.70 
0.72 



Figure 1. VaR posterior distribution for α = 1%.
Figure 2. Comparison between classical and Bayesian estimates of VaR(%). We
consider two significance levels: α = 1% (top panel) and α = 5% (bottom panel).
Figure 3. Sensitivity of α = 1% VaR(%) estimates for the μ-PPM model with
respect to the value of the hyperparameter c in the cohesion function. The other
hyperparameters assume the values quoted in the main text.
Figure 4. Sensitivity of α = 1% VaR(%) estimates with respect to the value of the
hyperparameters λ₀ = a(a + 1) and ν₀ = 2 + a. The other hyperparameters assume
the values quoted in the main text.
Figure 5. Posterior distributions for VaR at level α = 1% for Lottomatica as a
function of a. The other hyperparameters assume the values quoted in the main
text.
Table 5. Sensitivity analysis of outlier detection with respect to the value of c. a, λ₀, ν₀ assume the values quoted in the main text.
Subscripts 1, 2, 3 mean that the corresponding cluster is S₁, S₂, and S₃, respectively (see the outlier detection procedure described in the main text).

c      MIB30
0.1    {(46, 48, 155, 476)₁ (37, 275)₂}
0.5    {(46, 48, 155, 476)₁ (37, 275, 479, 761)₂}
1      {(37, 46, 48, 155, 275, 476)₁ (5, 45)₃}
5      {(37, 46, 48, 155, 275, 476)₁ (5, 45)₃}
10     {(37, 46, 48, 155, 275, 476)₁ (5, 45)₃}
50     {(37, 46, 48, 155, 275, 476, 479, 761)₁ (5, 32, 45)₃}

c      SRG.MI
0.1    {(6, 7, 46, 47, 108, 155, 271, 782)₁ (3, 138, 164, 633, 827)₃}
0.5    {(6, 7, 46, 47, 108, 155, 271, 782)₁ (3, 138, 164, 633, 827)₃}
1      {(6, 7, 46, 47, 108, 155, 271, 782)₁ (3, 138, 164, 633, 827)₃}
5      {(6, 7, 46, 47, 108, 155, 271, 782)₁ (3, 138, 164, 633, 827)₃}
10     {(6, 7, 46, 47, 108, 155, 271, 782)₁ (3, 138, 164, 633, 827)₃}
50     {(6, 7, 46, 47, 108, 155, 271, 782)₁ (3, 138, 164, 633, 827)₃}

c      LTO.MI
0.1    {(7, 11, 422, 464, 742, 787)₁ (2, 51, 131, 227, 326, 461, 551, 553, 664, 679, 733, 777, 794)₃}
0.5    {(7, 11, 422, 464, 742, 787)₁ (2, 51, 131, 227, 326, 461, 551, 553, 664, 679, 733, 777, 794)₃}
1      {(7, 11, 422, 464, 742, 787)₁ (2, 51, 131, 227, 326, 461, 551, 553, 664, 679, 724, 733, 777, 794)₃}
5      {(7, 11, 422, 464, 742, 787)₁ (2, 51, 131, 227, 326, 461, 551, 553, 664, 679, 724, 733, 777, 794)₃}
10     {(7, 11, 422, 464, 742, 787)₁ (2, 51, 131, 227, 326, 461, 551, 553, 664, 679, 724, 733, 777, 794)₃}
50     {(7, 11, 110, 134, 155, 181, 189, 291, 422, 464, 742, 787)₁
        (2, 51, 131, 227, 326, 461, 551, 553, 664, 679, 724, 733, 777, 794)₃}

c      MB.MI
0.1    {(48, 106, 237, 370, 515, 530, 540, 573)₂ (5, 8, 161, 403, 667, 668, 711, 722)₃}
0.5    {(9, 49)₁ (5, 8, 48, 106, 161, 237, 370, 403, 515, 530, 540, 573, 667, 668, 711, 722)₃}
1      {(9, 49)₁ (5, 8, 48, 106, 161, 237, 370, 403, 515, 530, 540, 573, 667, 668, 711, 722)₃}
5      {(9, 49, 721)₁ (5, 8, 48, 106, 161, 237, 370, 403, 515, 530, 540, 573, 667, 668, 711, 722, 813)₃}
10     {(9, 49, 721)₁ (5, 8, 48, 106, 161, 237, 370, 403, 515, 530, 540, 573, 667, 668, 711, 722, 813)₃}
50     {(9, 49, 721)₁ (5, 8, 46, 48, 106, 161, 237, 370, 403, 428, 515, 530, 540, 573, 667, 668, 710, 711, 722, 813)₃}



Table 6

Backtesting results: the model is rejected at the 5% significance level if LR_uc > 3.84
(unconditional coverage test) or LR_cc > 5.99 (conditional coverage test).

LTO.MI        α = 1%                              α = 5%
              # Exceptions   LR_uc    LR_cc       # Exceptions   LR_uc     LR_cc
μ-PPM         5              1.857    2.057       9              1.288     1.947
σ²-PPM        1              1.237    1.245       5              13.873    14.073